Functional gene clustering via gene annotation sentences, MeSH and GO keywords from biomedical literature

Authors

  • Jeyakumar Natarajan
  • Jawahar Ganapathy
Abstract

Gene function annotation remains a key challenge in modern biology. This is especially true for high-throughput techniques such as gene expression experiments. Vital information about genes is available electronically from biomedical literature in the form of full texts and abstracts. In addition, various publicly available databases (such as GenBank, Gene Ontology and Entrez) provide access to gene-related information at different levels of biological organization, granularity and data format. This information is being used to assess and interpret the results from high-throughput experiments. To improve keyword extraction for annotational clustering and other types of analyses, we have developed a novel text mining approach, which is based on keywords identified at the level of gene annotation sentences (in particular sentences characterizing biological function) instead of entire abstracts. Further, to improve the expressiveness and usefulness of gene annotation terms, we investigated the combination of sentence-level keywords with terms from the Medical Subject Headings (MeSH) and Gene Ontology (GO) resources. We find that sentence-level keywords combined with MeSH terms outperform the typical 'baseline' set-up (term frequencies at the level of abstracts) by a significant margin, whereas the addition of GO terms improves matters only marginally. We validated our approach on a manually annotated corpus of 200 abstracts generated on the basis of 2 cancer categories and 10 genes per category. We applied the method to three sets of differentially expressed genes obtained from pediatric brain tumor samples. This analysis suggests novel interpretations of discovered gene expression patterns.
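
The idea of combining sentence-level keywords with MeSH terms can be illustrated with a minimal sketch. The abstract does not say which similarity measure or clustering algorithm the authors used, so the cosine similarity, the single-link threshold grouping, the function names and the toy gene data below are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def gene_profile(sentence_keywords, mesh_terms):
    """Term-frequency profile for one gene (hypothetical helper).

    Counts keywords taken from annotation sentences and adds MeSH terms
    to the same bag of terms, lower-cased so that both sources match.
    """
    profile = Counter(k.lower() for k in sentence_keywords)
    profile.update(t.lower() for t in mesh_terms)
    return profile


def cosine(a, b):
    """Cosine similarity between two term-frequency profiles."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(profiles, threshold=0.5):
    """Single-link grouping (an assumed, simplified clustering step).

    A gene joins every existing cluster that contains at least one gene
    with similarity >= threshold; all such clusters are merged.
    """
    clusters = []
    for gene, prof in profiles.items():
        linked = [c for c in clusters
                  if any(cosine(prof, profiles[g]) >= threshold for g in c)]
        merged = {gene}
        for c in linked:
            merged |= c
            clusters.remove(c)
        clusters.append(merged)
    return clusters


# Toy data: gene names, keywords and MeSH terms are invented placeholders.
profiles = {
    "geneA": gene_profile(["apoptosis", "cell cycle"], ["Apoptosis"]),
    "geneB": gene_profile(["apoptosis"], ["Apoptosis", "Neoplasms"]),
    "geneC": gene_profile(["transcription"], ["Transcription Factors"]),
}
clusters = cluster(profiles)
print(sorted(sorted(c) for c in clusters))
```

In this toy setting geneA and geneB share the term "apoptosis" from both sources and are grouped together, while geneC shares no terms and stays alone. An abstract-level baseline would differ only in where the keywords passed to `gene_profile` come from.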

Similar resources

MeSH annotation of the chicken genome: MeSH-informed enrichment analysis and MeSH-guided semantic similarity among functional terms and gene products

Biomedical vocabularies and ontologies aid in recapitulating biological knowledge. The annotation of gene products is mainly accelerated by Gene Ontology (GO) and more recently by Medical Subject Headings (MeSH). Here we report the MeSH annotation of the chicken genome and illustrate some features of different MeSH-based analyses, including MeSH-informed enrichment analysis and M...

The Alignment of the Medical Subject Headings to the Gene Ontology and Its Application in Gene Annotation

The Gene Ontology (GO) is a controlled vocabulary used for annotation of genes. Assigning such terms to uncategorized genes is time-consuming work, and a recurring task in biomedicine. The biomedical citations of the literature database MEDLINE are indexed with terms from the Medical Subject Headings (MeSH). We studied whether MeSH terms from gene-related MEDLINE entries could be translated to G...

MeSH-Informed Enrichment Analysis and MeSH-Guided Semantic Similarity Among Functional Terms and Gene Products in Chicken

Biomedical vocabularies and ontologies aid in recapitulating biological knowledge. The annotation of gene products is mainly accelerated by Gene Ontology (GO), and more recently by Medical Subject Headings (MeSH). Here, we report a suite of MeSH packages for chicken in Bioconductor, and illustrate some features of different MeSH-based analyses, including MeSH-informed enrichment analysis and Me...

Development and application of an interaction network ontology for literature mining of vaccine-associated gene-gene interactions

BACKGROUND Literature mining of gene-gene interactions has been enhanced by ontology-based name classifications. However, in biomedical literature mining, interaction keywords have not been carefully studied and used beyond a collection of keywords. METHODS In this study, we report the development of a new Interaction Network Ontology (INO) that classifies >800 interaction keywords and incorp...

Journal:
  • Bioinformation

Volume 2  Issue 

Pages  -

Publication date 2007